In recent years there has been an increased emphasis on the
forecasting of earnings. This increased emphasis is largely a result of a
widespread recognition that future earnings are an important factor in
investor decision making. This is evidenced by the fact that the
Financial Accounting Standards Board has cited future earnings as the
single most important variable in determining a stock's value, and the
Securities and Exchange Commission has recently considered requiring
earnings forecasts in external reports.

The increased emphasis on future earnings has led to a complementary
increased emphasis on methods of predicting future earnings. In
particular, considerable attention has been given to statistical
methodology for predicting future earnings. The reason for this is that
the accuracy of forecasts is largely dependent on the forecast
methodology employed; in particular, if a statistical methodology leads
to misspecified or suboptimal forecasts, then dysfunctional or suboptimal
decision making can result from the use of such forecasts.

The present study classified and evaluated the statistical forecast
methodologies that have been used in accounting. It was demonstrated
that these methodologies employ statistical models which, due to inherent
limitations, ignore data that might have potential for improved forecasts.
The  purpose  of  the  study  was  to  introduce  and  employ  a multivariate 
generalization  of  the  univariate  time  series  methodology  developed  by 
Box  and  Jenkins,  which  overcomes  certain  limitations  of  previously  used 
statistical  methodologies.  Specifically  this  methodology  was  generalized 
to  include  an  additional  predictor  variable  (in  addition  to  earnings 
itself).  The  additional  variable  then  was  used  to  test  the  predictive 
value  of  various  ratio,  market  and  industry  data. 

These  data  were  reduced  to  four  composite  indices  through  the  use 
of  factor  analysis.  The  four  independent  indices  then  were  tested 
individually  to  determine  their  effect  on  the  predictability  of  the 
earnings  forecast.  The  results  indicated  that  these  indices  did  not 
produce  significant  improvement  in  forecasts. 




CHAPTER  1 


INTRODUCTION  AND  PROBLEM  STATEMENT 


Overview 

In recent years there has been an increased emphasis on the
forecasting of earnings. This increased emphasis is largely a result of
a widespread recognition that future earnings are an important factor
in investor decision making. This is evidenced by the fact that the
Financial  Accounting  Standards  Board  (FASB)  has  made  the  importance  of 
future  earnings  a primary  consideration  in  the  theoretical  framework 
underlying  their  recently  proposed  Objectives  of  Financial  Reporting 
and  Elements  of  Financial  Statements  of  Business  Enterprises  (Financial 
Accounting  Standards  Board,  1977).  In  addition  the  Securities  and 
Exchange  Commission  (SEC)  recently  has  been  considering  requiring 
earnings  forecasts  in  external  reports  (Prakash,  Prem  and  Rappaport, 
eds.,  1974). 

The  increased  emphasis  on  future  earnings  also  has  led  to  an 
increased  emphasis  on  methods  of  predicting  future  earnings.  In 
particular,  considerable  attention  has  been  given  to  statistical 
methods that yield predictions of future earnings. The reason for
this is that the accuracy of forecasts is largely dependent on the
forecast method employed; in particular, if a statistical method
leads to misspecified or suboptimal forecasts, then dysfunctional or
suboptimal decision making can result from the use of such forecasts.

The  present  study  deals  with  the  fact  (which  is  demonstrated  below) 
that  the  statistical  forecast  methods  that  previously  have  been  used  in 
accounting  research  have  inherent  limitations  with  respect  to  the 
utilization  of  data.  Specifically,  the  statistical  methods  that 
previously  have  been  used  are  limited  because  they  ignore  data  that  have 
the  potential  to  improve  forecasts. 

The  purpose  of  the  study  is  to  explore  this  limitation  by  employing 
a more  general  approach  to  statistical  forecasting  which  incorporates 
additional  data  into  the  forecast  model.  This  approach  will  overcome 
certain  of  the  limitations  of  previously  used  statistical  forecast 
methods. 

The research method employs a transfer function method developed by
Box and Jenkins (1970). This approach adds to past earnings a second
predictor variable in the form of an index constructed from ratio,
industry and market data via factor analysis. The null hypothesis is
that this index will not produce improved forecasts when incorporated
in the basic model.

The  central  contribution  of  the  study  is  that  information  is 
provided  with  respect  to  the  value  of  adding  data  to  previously  employed 
models.  For  investment  decisions,  this  information  is  relevant  because 
it  contributes  to  providing  a basis  for  economic  valuation  of  the 
benefits of ratio, industry and market data. The need for such
valuation arises from the fact that data are generally obtained at a cost.
Rational decisions with respect to their purchase therefore can only be
made relative to their benefits, one of which is their ability to improve
forecasting.

The  study  is  presented  in  four  chapters.  The  main  focus  of  the 
remainder  of  Chapter  I is  on  an  evaluation  of  the  statistical  forecast 
methods  that  have  been  used  in  accounting.  This  includes  a discussion 
of certain limitations of traditional forecast methods. This
discussion is followed by a detailed discussion of the problem and purpose
of the study.

In Chapter II, the research methodology first is developed. This
is followed by a statistical test of the null hypothesis and a
discussion of the results. In Chapter III, the entire study is summarized,
followed by conclusions and suggestions for future research.

Introduction: Recent Developments in Earnings Forecast Research

The  FASB,  the  SEC  and  Future  Earnings 

As  previously  mentioned,  there  has  been  an  increased  emphasis  on 
the  study  of  statistical  models  relating  to  forecasts  of  accounting 
earnings  numbers.  A primary  reason  for  this  emphasis  is  that  earnings 
forecasts,  the  output  of  these  models,  are  an  important  factor  in 
investor  decision  making.  This  emphasis  can  be  seen  from  the  FASB's 
recently  proposed  Objectives  of  Financial  Reporting  and  Elements  of 
Financial  Statements  of  Business  Enterprises  (Financial  Accounting 
Standards  Board,  1977)  which  was  based  on  the  theoretical  framework 
set forth in the Tentative Conclusions on Objectives of Financial
Statements of Business Enterprises (Financial Accounting Standards Board,
1977). In the latter document, the Board relied on four propositions:

(1)  The  primary  interest  of  the  investor  is  in  a return 
on  his  investment  in  the  form  of  cash  flows  (p.  45). 

(2)  Earnings as measured by accrual accounting are generally
thought to be the most relevant indicator of an
enterprise's cash earning ability (p. 45).

(3)  Fundamental  financial  analysis  focuses  on  the  earning 
power  of  an  enterprise  in  estimating  the  intrinsic 
value  of  the  stock  (p.  57). 

(4)  The  most  important  single  factor  in  determining  a 
stock's  value  is  now  held  to  be  the  indicated  average 
future  earning  power  (p.  57). 

This  importance  of  future  earnings  also  has  been  recognized  by  the 
SEC  which  has  considered  the  requirement  that  forecasts  be  included  as 
disclosure  in  financial  statements  (Prakash,  Prem  and  Rappaport,  eds., 
1974).  In  addition,  empirical  evidence  supports  the  importance  of 
earnings forecasts. For example, in a 1973 survey of the members of
the Financial Analysts Federation, 99 percent of the respondents claimed
that they use earnings forecasts in decision making (Nordby, 1973;
Lorek et al., 1976).

The  Need  for  Statistical  Forecasting  Models 

In order to forecast future earnings a forecast method is required.
Such a method may rest on subjective judgment or on more formal
statistical analysis. The latter approach is the focus of this research
and henceforth will be designated as a statistical forecast method (SFM).
The need for a formal statistical method is derived from the advantages
of formal quantitative analyses, such as conduciveness to objectivity and
the ability to handle large amounts of data. SFMs have been utilized in
an increasing number of accounting research studies in two main areas:

(1)  Studies dealing with the predictability or information
content of accounting and nonaccounting numbers. A large
number of studies have been conducted in this area. These
include studies on the prediction of business failure
(Beaver, 1965; Altman, 1968; Beaver, 1968; Deaken, 1972),
the prediction of market return from ratios (Gonedes,
1973) and the time series properties of accounting numbers
(Beaver, 1970; Ball and Watts, 1972; Kennelly, 1972;
Lookabill, 1976). One reason for the emphasis on the
predictive properties of accounting numbers is that
predictability can be used as a surrogate for usefulness
(Beaver, Kennelly and Voss, 1968), which has been cited
as the primary objective of accounting data (Report of
the Study Group on the Objectives of Financial
Statements, 1973; American Accounting Association, 1966).

(2)  Studies examining the forecast success of managers
relative to statistical models (Green and Segall, 1966;
Cragg and Malkiel, 1968; Steckler, 1968; Mincer and
Zarnowitz, 1969; Copeland and Marioni, 1972; Slovic et al.,
1972; and Lorek et al., 1976).


Classification  and  Evaluation  of  Statistical  Forecast  Methods  Previously 
Used  in  Accounting  Research  with  Respect  to  Utilization  of  Data 

Introduction:  Factors  Related  to  SFM  Development 

In general the SFMs employed in the above studies depended on the
use of a structural model and a data set in the identification,
estimation and application of that model.

Structural  Model.  In  the  present  context  a structural  model  is 
defined  as  a mathematical  model  that  describes  a variable  or  variables 
of  interest.  A structural  model  will  contain  one  predicted  variable 
expressed  as  a function  of  one  or  more  predictor  variables. 

The above definition is very broad and encompasses many families
of well known forecast models. Two primary examples of such families
are the multiple regression and autoregressive integrated moving
average models (see Box and Jenkins, 1970). Each of these categories
is referred to as a family because in each case an infinitely large
number of models fit into the category. For example, in multiple
regression there exist the models

$\{y = a_1 x_1 + b,\; y = a_1 x_1 + a_2 x_2 + b,\; y = a_1 x_1 + a_2 x_2 + a_3 x_3 + b,\; \ldots\}$.

(Throughout the study y's are used as predicted variables, x's as
predictor variables, u's as error terms and other letters as constants.)
Each of the models is a structural model. Similarly, in the
autoregressive integrated moving average (ARIMA) family there exists an
infinite number of possible models of the form

$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_n y_{t-n} + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_Z u_{t-Z} + u_t$.

For example, in the regression case the second possible structural
model is $y = a_1 x_1 + a_2 x_2 + b$; in the ARIMA case any structural
model is specified by an appropriate selection of the subscripts
n and Z. In particular, the order of the autoregressive portion of the
model is equal to n and the order of the moving average portion is
equal to Z. For example, a second order autoregressive model is
specified by a value of n equal to two and a value of Z equal to zero.
Similarly, a first order moving average model is consistent with a value
of n equal to zero and a value of Z equal to one.
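The role of the two orders can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `simulate_arma`, the coefficient values, the Gaussian shocks and the choice to enter the moving average terms with a plus sign are not taken from the study, and no differencing is applied.

```python
import random

def simulate_arma(phi, theta, length, seed=0):
    """Simulate y_t = sum_i phi[i]*y_{t-1-i} + sum_j theta[j]*u_{t-1-j} + u_t.

    len(phi) is the autoregressive order n and len(theta) is the moving
    average order Z. Values before the start of the series are taken as zero.
    """
    rng = random.Random(seed)
    y, u = [], []
    for t in range(length):
        shock = rng.gauss(0.0, 1.0)  # hypothetical standard normal error term
        ar_part = sum(p * y[t - 1 - i] for i, p in enumerate(phi) if t - 1 - i >= 0)
        ma_part = sum(q * u[t - 1 - j] for j, q in enumerate(theta) if t - 1 - j >= 0)
        y.append(ar_part + ma_part + shock)
        u.append(shock)
    return y, u

# Second order autoregressive model: n = 2, Z = 0 (illustrative coefficients).
ar2, ar2_shocks = simulate_arma(phi=[0.5, -0.3], theta=[], length=50)

# First order moving average model: n = 0, Z = 1 (illustrative coefficient).
ma1, ma1_shocks = simulate_arma(phi=[], theta=[0.4], length=50)
```

Each choice of the pair (n, Z), together with its coefficients, picks out one structural model from the family; the two calls above correspond to the two examples given in the text.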

The structural models in both the ARIMA and multiple regression
families, however, are limited in form. This is true in spite of the
fact that both families contain an infinite number of structural models.
For example, the simple multiple regression model $y = a_1 x_1 + a_2 x_2 + b$
is not included in the ARIMA family; also, none of the ARIMA models is
contained in the multiple regression family. This is an important
limitation because in practice it is common for the forecaster to
begin the statistical analysis by selecting a particular
family; and once this is done he is not likely to select a structural
model outside of that family. This point is developed in detail below.

Identification.  Identification  is  defined  as  that  process  which 
is  used  to  select  a structural  model.  As  mentioned  above  the  selection 
is  generally  limited  to  a selection  of  a structural  model  within  a 
given  family. 

The identification process can be made with or without utilizing
data corresponding to the predicted variable. Models identified
without the use of such data are referred to as "non-data at hand" (NDH)
models. More precisely, all models which violate either of the
following criteria are classified as NDH: (1) a given model must use data
for identification; (2) the same data must be used for both the
identification of the structural model and the estimation of the
structural model parameters. Therefore there are two ways that an
identification can be NDH, namely: (1) identification without data
analysis (the first criterion is violated), and (2) identification with
data analysis, while using two different data sets for identification
and estimation (the second criterion is violated). Finally, models that
do meet both criteria are referred to as "utilize data at hand" (UDH)
models.
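The two criteria amount to a small decision rule, which can be written out as a Python sketch. The function name `classify_identification` and its boolean arguments are hypothetical labels introduced here for illustration only.

```python
def classify_identification(uses_data, same_data_for_estimation):
    """Classify an identification as NDH (first or second kind) or UDH.

    uses_data: criterion 1, data are used to identify the structural model.
    same_data_for_estimation: criterion 2, the same data are used for both
    identification and estimation of the parameters.
    """
    if not uses_data:
        return "NDH (first kind)"   # identification without data analysis
    if not same_data_for_estimation:
        return "NDH (second kind)"  # different data sets for the two steps
    return "UDH"                    # both criteria are met

# A model identified on market data but estimated on one firm's series.
market_identified = classify_identification(True, False)
```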

NDH models of the first kind occur when an individual relies on
past experience and/or deduction (as opposed to data analysis) to make
the identification. For example, if an individual knows that a certain
process grows at an exponential rate he is likely to select an infinite
set of exponential curves such as $y_t = a^t$. In this case, his remaining
task is to find the value of a which is most consistent with the
process. It is important to recognize that the NDH selection of a model
can lead to satisfactory or unsatisfactory results. If the non-data
selection is valid, one can save time and eliminate the step of
statistical identification without loss of predictive power. On the other
hand, if the underlying process is different from what one assumes it to
be, there can be a serious error in terms of the forecasting power of the
model.
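As an illustration of the remaining task in the exponential example, the following Python sketch finds a value of a for an assumed exponential process. The criterion used (least squares on the logarithms of the observations) and the function name `fit_exponential_base` are assumptions made for this sketch; the text does not say how the value "most consistent with the process" would be chosen.

```python
import math

def fit_exponential_base(series):
    """Estimate a in y_t = a**t by least squares on log(y_t) = t*log(a).

    series[0] is taken as the observation for t = 1; all values must be
    positive for the logarithm to exist.
    """
    numerator = sum(t * math.log(y) for t, y in enumerate(series, start=1))
    denominator = sum(t * t for t in range(1, len(series) + 1))
    return math.exp(numerator / denominator)

# Hypothetical noise-free observations generated with a = 1.1.
observed = [1.1 ** t for t in range(1, 9)]
a_hat = fit_exponential_base(observed)
```

Note that the family (exponential growth) is fixed before any data are examined; the data enter only at the estimation step, which is what makes this an NDH identification of the first kind.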

NDH models of the second kind can occur in a wide variety of
contexts and recently have been studied in the accounting literature (see
Brown, 1977). In particular, ARIMA models identified using data for the
market have been compared to firm identified models; the results have
tended to indicate that one general model produces forecasts
comparable to those of firm identified models. However, it should be
noted that identification of a model with one set of data and estimation
of parameters with another allows for the possibility that in a given
case a model might be selected that is not consistent with the
underlying process of the series used for estimation. For example, assume
that the structural model of a particular firm is different from the
structural model for the market taken as a whole. In this case
identification using market data would not lead to the correct model.

UDH models have been used extensively. Examples are step-wise
multiple regression and the Box-Jenkins method as it has been used
traditionally. In step-wise regression one first considers a class of
models, each containing a different number of independent variables, and
proceeds to select the simplest model possible without a significant
loss in explanatory power. For example, an individual using multiple
regression might examine the two model class
$\{y = a_1 x_1 + b,\; y = a_1 x_1 + a_2 x_2 + b\}$ and find that the
second model $y = a_1 x_1 + a_2 x_2 + b$ has the same correlation
coefficient as the simpler model $y = a_1 x_1 + b$.
He therefore will select the simpler model since the more complex model
offers no advantage. The procedure is similar in the Box-Jenkins case
except there is a tendency to make the identification in a two stage
process involving (1) an initial selection and (2) improvement when
warranted. Specifically, the individual will first, via examination of
the data, select a model from the ARIMA class and then proceed to
replace this model with another in the event that diagnostic checks
indicate model inadequacy.
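The regression example can be sketched in Python as a comparison of the two candidate models on the same data. This is a simplified stand-in for step-wise regression, not the procedure itself: the helper names, the fixed `tolerance` on the gain in R-squared (used here in place of a formal significance test) and the data are all hypothetical.

```python
def least_squares(columns, y):
    """Fit y = sum_k coef[k]*columns[k] + b by solving the normal equations."""
    cols = list(columns) + [[1.0] * len(y)]  # last column is the intercept b
    k = len(cols)
    # Augmented normal-equation matrix [X'X | X'y].
    m = [[sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(cols[i], y))] for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(i, k), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(k):
            if r != i:
                factor = m[r][i] / m[i][i]
                m[r] = [a - factor * b for a, b in zip(m[r], m[i])]
    return [m[i][k] / m[i][i] for i in range(k)]

def r_squared(columns, y):
    """Share of the variation in y explained by the fitted model."""
    coef = least_squares(columns, y)
    cols = list(columns) + [[1.0] * len(y)]
    fitted = [sum(c * col[t] for c, col in zip(coef, cols)) for t in range(len(y))]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def select_model(x1, x2, y, tolerance=0.01):
    """Keep the simpler model unless adding x2 raises R^2 by more than tolerance."""
    simple, full = r_squared([x1], y), r_squared([x1, x2], y)
    if full - simple <= tolerance:
        return "y = a1*x1 + b"
    return "y = a1*x1 + a2*x2 + b"

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
y_one = [2.0 * a + 1.0 + e for a, e in zip(x1, noise)]  # x2 plays no role
y_two = [a + 3.0 * b for a, b in zip(x1, x2)]           # x2 matters

choice_one = select_model(x1, x2, y_one)
choice_two = select_model(x1, x2, y_two)
```

In both cases the same data are used to choose the model and to estimate its coefficients, which is the defining property of a UDH identification.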

A primary advantage of UDH models is that they "let the data speak
for themselves"; that is, they allow for a difference between the
individual's priors about what he thinks the structural model should
be and what the data indicate it actually is. In particular, an NDH
modeling process might result in a model that is highly inconsistent
with the history of the variable of interest, but the UDH model would
not succumb to this problem.

In  summary  it  is  concluded  that  both  the  NDH  and  UDH  approaches 
have  their  individual  advantages  and  limitations.  In  addition,  the 
usefulness  of  a particular  approach  depends  on  the  forecasting  problem 
at  hand.  In  the  context  of  the  present  study  the  focus  is  entirely  on 
UDH  models. 

Estimation. This phase of SFM development is the process wherein
the parameters of the structural model are selected. For example, in
the case of the model $y_t = \phi_1 y_{t-1} + u_t$ the decision maker
generally selects what he considers to be the value of $\phi_1$ that
maximizes the likelihood of the observed series given the structural
model he has identified and distributional assumptions such as normality.
This is the approach of classical statistics known as maximum likelihood
estimation, which typically results in a formula (known as an estimator)
which when applied to the data produces the structural model parameters.
For example, in the case of Box-Jenkins models the method of maximum
likelihood estimation is applied to produce the formula "minimize"
$[S(\phi, \theta)]$, where the term being minimized is the sum of the
squared data residuals expressed as a function of the parameters
$\{\phi, \theta\}$ (both vectors). A primary reason that the maximum
likelihood estimator is used is that it generally has the statistical
properties of leading to minimum variance and unbiased estimates.
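For the first order autoregressive model the minimization can be carried out in closed form, as the following Python sketch shows. It is an illustration under assumptions: the function names are hypothetical, the series is simulated with an arbitrary coefficient of 0.6, and the sum of squares is conditioned on the first observation, which approximates the full likelihood treatment rather than reproducing it.

```python
import random

def sum_of_squares(phi, series):
    """S(phi): sum of squared residuals u_t = y_t - phi*y_{t-1}."""
    return sum((series[t] - phi * series[t - 1]) ** 2 for t in range(1, len(series)))

def estimate_phi(series):
    """Closed-form minimizer of S(phi) for y_t = phi*y_{t-1} + u_t."""
    numerator = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    denominator = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return numerator / denominator

# Hypothetical series generated with phi = 0.6 and standard normal shocks.
rng = random.Random(1)
y = [0.0]
for _ in range(500):
    y.append(0.6 * y[-1] + rng.gauss(0.0, 1.0))

phi_hat = estimate_phi(y)
```

Because S is quadratic in the single parameter, any other value of the coefficient gives a sum of squares at least as large as the one obtained at `phi_hat`.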

At  this  point  it  should  be  noted  that  the  usefulness  of  estimation 
is  dependent  on  the  assumption  that  a reasonable  structural  model  has 
been  selected.  This  in  turn  assumes  that  the  decision  to  select  the 
original  family  of  models  for  identification  was  valid. 

Data Set. Implicit in the individual's choice of a family of models
is a selection of data to be used in the modeling process. Stated
differently, the selection of a particular family of models imposes a
restriction as to which variables will be involved in the final
forecasting process. For example, if an individual chooses the family
$\{y = a_1 x_1 + b,\; y = a_1 x_1 + a_2 x_2 + b\}$, he has limited his
analysis to a consideration of the effects of $x_1$ and $x_2$.


A Classification  of  Forecasting  Models  Used  in  Forecasting  Accounting 
Earnings 

Building  upon  the  above  discussion,  structural  models  can  be 
categorized  along  three  dimensions: 

(1)  NDH vs. firm identified models.

(2)  Longitudinal  vs.  cross-sectional  models. 

(3)  Univariable  vs.  multivariable  models. 

NDH  vs.  Firm  Identified  Models.  As  discussed  above  NDH  models  are 
those  that  are  selected  without  utilizing  historical  data  corresponding 
to  the  particular  variable  being  forecasted,  whereas  UDH  models  do 
utilize  such  data.  In  the  case  of  forecasting  earnings,  a UDH  model 
always  will  utilize  in  both  identification  and  estimation  the  historical 
data  of  the  firm  for  which  forecasts  are  being  generated.  This  type 
of  UDH  model  is  therefore  referred  to  as  a firm  identified  model. 

Longitudinal  vs.  cross-sectional  models.  This  dichotomization 
divides  structural  models  into  two  general  classes,  longitudinal  and 
cross-sectional  models.  The  longitudinal  models  include  all  models 
that  explicitly  model  variables  as  a function  of  previous  values  of 
themselves  and/or  other  variables.  A primary  example  of  this  type  is 
the  ARIMA  family  of  models.  Cross-sectional  models  include  those  that 
do  not  explicitly  model  variables  as  functions  of  previous  values  of 
themselves  and/or  other  variables.  A primary  example  of  this  class  is 
the  multiple  regression  family  of  models  which  model  the  relationship 
between  two  or  more  variables  at  one  point  in  time. 

Note that the two classes involve an entirely different modeling
process and, in addition, different fundamental assumptions with respect
to the variable of interest. In the longitudinal case it
is assumed that a stochastic process should represent the variable being
modeled. Another way of saying this is that the variable of interest is
assumed to change over time according to some modelable pattern; that
is, certain time series properties are assumed to exist. For example,
the simple autoregressive model $y_t = -y_{t-1} + u_t$ assumes that the
value of y in the period t depends heavily on the value of y in the
previous time period t - 1. In particular, since y, in a given period t,
is expected to be the negative of $y_{t-1}$ it is easy to see that such a
series will tend to reverse in sign from one period to the next. This
exemplifies a modelable pattern.
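The sign reversal can be seen in a brief Python simulation of this autoregressive model. The starting value, the number of periods and the small shock standard deviation are arbitrary choices for the sketch; with shocks this small relative to the level of the series, the reversal shows up in every period.

```python
import random

def simulate_alternating(y0, periods, shock_sd, seed=0):
    """Simulate y_t = -y_{t-1} + u_t with Gaussian shocks u_t."""
    rng = random.Random(seed)
    series = [y0]
    for _ in range(periods):
        series.append(-series[-1] + rng.gauss(0.0, shock_sd))
    return series

series = simulate_alternating(y0=5.0, periods=20, shock_sd=0.1)

# Count the periods in which the sign differs from that of the prior period.
reversals = sum(1 for prev, cur in zip(series, series[1:]) if prev * cur < 0)
```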

The cross-sectional class, on the other hand, does not consider any
time series properties. Instead this class of models utilizes linear
dependencies between variables that exist at one point in time. An
example of this class is the multiple regression family of models. For
instance, the regression model $y = a_1 x_1 + a_2 x_2 + b$ could be
written equivalently as $y_t = a_1 x_{1,t} + a_2 x_{2,t} + b$. Note that
all variables have the same time subscript and thus occur at the same
point in time.

Univariable vs. multivariable models. This dichotomization
separates families of models into two classes, namely those with one
explanatory variable and those with more than one explanatory variable.
Of particular importance is the fact (this is discussed below in detail)
that the family of ARIMA models is restricted to one variable.

It should be noted that the restriction to one variable can lead
to an unnecessarily restricted model. This is easily seen in the case
of multiple regression where additional variables may result in a better
fitting  model.  For  example,  an  employer  interested  in  establishing  a 
statistical  relationship  between  the  sales  levels  of  his  various 
salesmen  and  certain  independent  variables  probably  would  not  be  happy 
in  considering  college  GPA  as  the  only  independent  variable.  He  might 
want  to  consider  other  variables  such  as  experience  or  the  geographic 
location  of  the  sales  district  or  personality. 

An  Evaluation  of  Statistical  Methods  Used  in  Earnings  Forecasting 
with  Respect  to  Data  Utilization 

In addition to the above, models can utilize three distinct data
sets: data internal to the firm (for convenience earnings are excluded
from this definition), data external to the firm and earnings data. An
example of internal data is ratios and an example of external data is
an industry index. Also, as mentioned above, the family of models used
by an individual generally places restrictions upon the data that are
utilized by the model in forecasting.

Given  the  above  classification  system  we  can  characterize  previous 
research  along  five  specified  SFMs: 

(1)  Firm identified multivariable cross-sectional models
utilizing internal data in identification and application.

(2)  Firm identified multivariable cross-sectional models
utilizing external data in identification and application.

(3)  NDH univariable longitudinal models using earnings in
application.

(4)  Firm identified univariable longitudinal models utilizing
earnings in identification and application.

(5)  Firm identified multivariable longitudinal models
utilizing earnings in identification and application.

SFMs (1) and (2). SFMs (1) and (2) include the above mentioned
business failure studies, the Gonedes (1973) study on the predictive
value of market information in relation to earnings, and the O'Connor
(1973) study on the predictive value of ratios in relation to market
return.

These  studies  have  in  general  shown  that  both  internal  and  external 
information  have  predictive  value.  Gonedes  (1973)  found  that  market 
information  is  useful  in  predicting  EPS,  and  the  business  failure 
studies  have  shown  that  ratio  information  is  useful  in  predicting 
earnings.  A notable  exception,  however,  is  the  O'Connor  (1973)  study 
which  found  no  predictive  value  in  ratio  information  in  predicting  high 
vs.  low  market  return.  Little  is  known  about  the  association  between 
ratios  and  accounting  dollar  earnings. 

SFMs (1) and (2), however, are cross-sectional and by definition
do not exploit the time series properties of the data. Lagged
cross-correlational dependencies are ignored. For example, the family of
multiple regression models explicitly models only events occurring at
one point in time and cannot take into consideration relationships in
a variable at more than one point in time. To illustrate this point,
assume that for a given firm the expected earnings of the fourth
quarter are always double the earnings of the first quarter. In this
case a cross-sectional model would ignore this information.

SFM (3). SFM (3) includes a large number of studies relating to
the general subject of security price research. However, Collins (1976)
and Gonedes and Dopuch (1974) have argued that selecting this type of
model on an NDH basis can lead to suboptimal results. The
reason for this, as discussed in detail above, is that NDH models do
not utilize all of the data at hand. In particular, the NDH approach
when applied to earnings forecasting for an individual firm results
in ignoring the previous history of that firm's earnings. Ignoring
such data allows for the possibility of a misspecified model.

SFM (4). In recent years the Box-Jenkins methods for identification
and estimation of univariable stochastic time series models have been
introduced in the accounting literature (Mabert and Radcliffe, 1974).
One important aspect of this method is that it encompasses a very broad
family of structural models (i.e., the ARIMA models) that describe a
one variable time series. In addition, the method contains an
identification process that generally selects the best model from the
ARIMA class. The net result is the selection of a model that is
parsimonious in that it contains a small number of parameters.

Research has demonstrated that forecasts from Box-Jenkins models
are more accurate than other types of forecasts. For example, Foster
(1977) found that they compared favorably with several NDH models, and
Lorek et al. (1976) recently applied this paradigm to accounting
earnings series and found that an ARIMA model generally outperformed
management in published earnings forecasts.

With respect to data utilization, SFM (4), unlike SFMs (1) and
(2), does utilize time series properties in its identification and
application. However, the data set used contains only one variable
and thus precludes the exploitation of the internal-external data sets.

Of particular importance is the Box-Jenkins method, which encompasses
the large family of ARIMA models. Notable, however, is that all of these
models are univariate and have no capability of considering events that
occur in the external and internal data sets. For example, if industry
performance were a good leading indicator of the performance of a
particular firm of interest, the use of the univariate Box-Jenkins
method in itself would preclude any statistical consideration of such
information. Ignoring such information could, theoretically, lead to
a very poor forecast using SFM (4). This could be avoided using a less
restrictive SFM. For example, using the above hypothetical firm it
might be the case that a severe drop in industry performance would go
unnoticed using the Box-Jenkins method. This can be seen by observing
Table 1 below.

Table 1

Demonstration Of The Limitations Of ARIMA Models

Time (t)    EPS for hypothetical firm    Industry performance index
t-4          .42                         94
t-3          .43                         92
t-2          .41                         10
t-1          .44                         97
t            .45                         98
t+1         -.20                         96

Assume that EPS for time t+1 is being forecasted from time t. Note
that a forecast model that depended upon the last few values of the EPS
series would forecast a value for t+1 somewhere in the neighborhood of
.40 to .45. Further, note that the deviation between the actual EPS
(-.20) and the forecasted EPS would be very large.


This  problem  could  have  been  mitigated  by  using  information  in  the 
industry  index.  To  demonstrate  this  assume  that  there  is  a three 
quarter  delay  between  shocks  in  the  industry  index  and  reactions  in 
EPS.  Using  a less  restrictive  paradigm  we  could  have  anticipated  a 
sharp  drop  in  EPS  at  time  t-2  when  the  industry  index  suddenly  dropped 
to  10. 
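The Table 1 example can be traced numerically. The following is only a
rough sketch: the three-value averaging rule, the 50 percent drop
threshold and the function names are assumptions introduced for
illustration and are not part of any SFM discussed here. The three
quarter delay is the one assumed in the example above.

```python
# Table 1 data, keyed by offset from time t (-4 means t-4, 1 means t+1).
eps = {-4: 0.42, -3: 0.43, -2: 0.41, -1: 0.44, 0: 0.45, 1: -0.20}
index = {-4: 94, -3: 92, -2: 10, -1: 97, 0: 98, 1: 96}


def univariate_forecast(series, origin, window=3):
    """Naive stand-in for a univariate model: mean of the last `window` values."""
    values = [series[origin - k] for k in range(window)]
    return sum(values) / window


def index_warnings(series, origin, lag=3, drop=0.5):
    """Offsets at which EPS should react to an index fall of more than `drop`.

    Only index values observed up to `origin` are used (hypothetical rule).
    """
    warned = []
    for s in sorted(series):
        if s - 1 in series and s <= origin:
            if series[s] < (1 - drop) * series[s - 1]:
                warned.append(s + lag)
    return warned


forecast = univariate_forecast(eps, origin=0)   # uses EPS at t, t-1, t-2
error = eps[1] - forecast                       # actual minus forecast at t+1
warnings = index_warnings(index, origin=0)      # periods flagged by the index

print(round(forecast, 3), round(error, 3), warnings)
```

Under these assumptions the earnings-only forecast stays in the .40 to
.45 range, while the index series flags period t+1 three quarters in
advance.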


The above example is rather dramatic, but limited, in that there
are other types of events in the internal-external data sets, such as
trends and cycles, that might prove useful in forecasting.

SFM (5). SFM (5) has not been generally used but has recently been
suggested by Foster (1977). This SFM, which generalizes the traditional
Box-Jenkins approach, avoids the above limitations. In particular,
SFMs (1), (2) and (4) are generalized by allowing for the modeling of
the time series properties of the internal, external and earnings data
sets. The general form of SFM (5) is

(1)  $y_t = f[f_1(y_{t-1}, y_{t-2}, \ldots), f_2(x^{(1)}_t, x^{(1)}_{t-1}, \ldots), f_3(x^{(2)}_t, x^{(2)}_{t-1}, \ldots), \ldots, f_n(x^{(n-1)}_t, x^{(n-1)}_{t-1}, \ldots)] + u(t)$.

Note that (1) completely generalizes SFMs (1), (2) and (4) to
remove the above discussed limitations. In particular, $f_2, f_3, \ldots, f_n$
generalize SFM (4) by allowing $y_t$ to be modeled as a function of
$x^{(1)}, x^{(2)}, \ldots$ Also SFMs (1) and (2) are generalized by
$f_1, f_2, \ldots, f_n$ by the inclusion of $y_{t-1}, \ldots$ in $f_1$,
$x^{(1)}_{t-1}, x^{(1)}_{t-2}, \ldots$ in $f_2$, etc.

In addition, note that (1) will reduce to SFMs (1), (2) or (4) as
special cases. SFMs (1) and (2) can be specified by eliminating $f_1$ and
cases with subscripts less than t in $f_2, f_3, \ldots, f_n$. The net result is
a very broad family of models which contains the union of all models in
SFMs (1), (2) and (4) as a proper subset. In summary SFM (5), due to
its generality, has the ability to utilize more data than the other
SFMs. Specifically, it can utilize the time series and cross-sectional
properties of the internal, external and earnings data sets.

Statement  of  the  Problem 

It has been shown that there is a need for earnings forecasts for
use in various decision models and research studies. This implies that
the use of suboptimal forecasts can lead to dysfunctional decision
making or undesirable inferences in research. Gonedes (1973, pp. 212-213)
wrote, "If in fact a statistical model used in a given study
suffers from important misspecifications or if the model is not an
optimal (or at least near optimal) model for the time series of interest,
the use of the model is likely to induce tenuous inferences."

Notable, however, is that the specification of an SFM depends on
the data set used in its estimation and application; and in order to
select an optimal SFM one must know which data set to use. In an
economic sense the rational forecaster will use a "cost-benefit"
criterion in the selection from among alternative data sets. This
implies that he needs to know the value of his alternative data sets.
Stated differently, it is impossible to build a forecast model without
some specification as to which variables will be included in that model;
and the inclusion of a particular group of variables has implications
for model complexity and data gathering. In particular, the more
variables specified, the more complex the model becomes and the more
data that need to be supplied. Since data gathering and model
complexity can involve considerable costs, there is a need to know the
value of data before building a forecast model.


The valuation of alternative data sets has to be done in light of
their contribution to optimizing the forecast model. Therefore one must
have some objective in forecasting. Such an objective has been given
considerable attention and the National Association of Accountants,
American Institute of Certified Public Accountants and SEC, among other
groups, have determined that accuracy is of primary concern (Lorek et
al., 1976, p. 321).

Previous  Research 

There  has  been  little  research  involving  the  usefulness  of 
alternative  data  sets  in  forecasting  earnings  dollars.  One  study  to 
date  is  Gonedes  (1973),  which  compared  earnings  forecasts  based  on 
the  application  of  several  naive  models  utilizing  earnings  information 
in  application  only  [SFM  (3)].  It  was  concluded  that  a "market  index" 
model  [SFM  (2)]  generally  predicted  as  well  or  better  than  the  naive 
models  [SFM  (3)]. 

An admitted limitation of the Gonedes study was that the results
were dependent on the SFMs used. The study did show that the
external data set has information content; at the same time, however,
it does not show the relative value of that data set as compared to
other possible data sets.

Statement  of  the  Purpose 

The purpose of this study is to develop and apply SFM (5) to the
problem of valuing information content (in terms of earnings forecasts)
in the internal, external and earnings data sets. Such an SFM avoids
to a high degree the limitations of the SFMs that have been used in
previous research (as discussed below). To do this, a firm identified
multivariable longitudinal model utilizing simultaneously the internal,
external and earnings data sets is employed (SFM (5)). Specifically,
all of these data sets are analyzed and partitioned into orthogonal
components which are in turn evaluated in terms of their individual
usefulness in forecasting. These data sets are used to construct indices
(by factor analysis) which are individually added to SFM (4). Next,
these indices are evaluated in terms of their ability to reduce
forecast error.

Primary  Expected  Contribution 

The results of the study provide information on the usefulness
of internal, external and earnings data in forecasting earnings. In
particular, the study provides information on the usefulness
of adding internal and external data to the Box-Jenkins analysis. As
mentioned above, usefulness has been operationally defined to be
predictive ability (Beaver, Kennelly and Voss, 1968) and has been cited
as the primary objective of accounting data (Report of the Study Group
of Objectives on Financial Statements, 1973; American Accounting
Association, 1966). Also, as mentioned above, earnings forecasting
is a topic which is currently of prime importance to the FASB and SEC;
and in particular, if in the future formal guidelines are established
with respect to acceptable forecast models, there will be a need to
know the value of generalized ARIMA models which incorporate internal
and external data.


In addition to the above, this study provides an improvement in
the existing methodology of time series analysis in accounting research
and practice. In particular, a multivariate generalization of the
univariate ARIMA approach is introduced. This generalization will,
to a large degree, avoid the limitations (as discussed above) of the
Box-Jenkins and multiple regression approaches.


Notes 


1. In this study the term "transfer function models" is distinguished
from the term "Box-Jenkins model." While both are discussed by Box and
Jenkins (1970), the latter refers to univariate autoregressive
integrated moving average models and the former refers to a bivariate
generalization of these models.

2. Information is assumed to be data with predictive value.

3. The ARIMA family of models has recently received considerable attention
in the literature. Three reasons for this are: (1) These models
provide forecasts that are accurate when compared to those of other
models (Foster, 1977). (2) They require data gathering only for
the variable being forecasted. (3) They generally require only a
few (i.e., typically fewer than five) parameters and therefore avoid
the need for complex models.

4. Recall that identification is the process of structural model
selection.

5. Application in this context means parameter estimation and/or
forecasting.

6. Recent evidence suggests that NDH models of the second kind perform
well when compared to firm identified models (see Brown and
Rozeff, 1977).

7. This refers to a model identified and estimated by using the
Box-Jenkins method.


CHAPTER  II 


HYPOTHESES  AND  RESEARCH  METHOD 

General  Hypotheses 

At present SFM (4) is receiving a large amount of attention
in the accounting literature with respect to the forecasting of earnings.
(The main reason for this is that studies have shown that forecasts
generated with ARIMA models are comparable in accuracy to other types
of forecasts such as those published by management (Lorek et al., 1976)
and certain NDH models (Foster, 1977).) As pointed out, however, this
SFM ignores potential information in the internal-external data series.
Because of this omission it is hypothesized that ARIMA forecasts will
be improved by adding data from the internal-external data
sets. This leads to the following null and alternative hypotheses.


Hypothesis


H0:  The average absolute percent forecast error for the
     estimated multivariable longitudinal model utilizing
     earnings, internal and external data is equal to the
     average absolute percent forecast error for the estimated
     univariate longitudinal model utilizing earnings
     data alone.

Ha:  The average absolute percent forecast error for the
     estimated multivariable longitudinal model utilizing
     earnings, internal and external data is less than the
     average absolute percent forecast error for the estimated
     univariate longitudinal model utilizing earnings
     data alone.

Absolute percentage error was selected because it is a measure
that establishes a comparable relationship between firms that have
earnings which are different in absolute scale. For example, a fifty
cents per share forecast error for a firm that reports earnings of
twenty-five cents per share may be quite different in significance than
a fifty cents per share error for a firm that reports earnings of five
dollars per share. In the former case the error is 200 percent and in the
latter case the error is only 10 percent.
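
The two cases in the example can be checked with a short computation.
This sketch assumes that the error is scaled by reported (actual)
earnings, which is what the example implies; the function names are
introduced here for illustration only.

```python
def absolute_percentage_error(actual, forecast):
    """Absolute forecast error as a percentage of the absolute actual value."""
    if actual == 0:
        raise ValueError("percentage error is undefined when actual earnings are zero")
    return abs(actual - forecast) / abs(actual) * 100.0


def mean_absolute_percentage_error(actuals, forecasts):
    """Average of the absolute percentage errors over paired observations."""
    errors = [absolute_percentage_error(a, f) for a, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)


# A fifty cent error for firms reporting $0.25 and $5.00 per share.
small_firm = absolute_percentage_error(actual=0.25, forecast=0.75)
large_firm = absolute_percentage_error(actual=5.00, forecast=5.50)
average = mean_absolute_percentage_error([0.25, 5.00], [0.75, 5.50])

print(small_firm, large_firm, average)
```

The same fifty cent error is thus weighted twenty times more heavily for
the firm with the smaller earnings base.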

The basic thrust of Ha is to assert that SFM (5) will produce more
accurate forecasts than SFM (4). Since the basic difference between the
two SFMs is that SFM (5) models more potential information, namely
internal and external data, as opposed to earnings alone, the test
indicates whether or not the internal and external data sets contain
any information as determined by a lower forecast error. Specifically,
rejection of the null hypothesis would imply that the internal-external
data set has information content in that it can result in improved
forecasts when added to SFM (4). It also would imply that forecasts
based on earnings alone (i.e., SFM (4)) may not be optimal since they
can be improved by adding more data to the forecast model. The
question of optimality, however, must be decided on whether or not the
improvement justifies the additional costs that might be associated
with the acquisition and usage of the additional data.

Research  Method 
Sample 

A sample of 30 airline firms was selected. This industry was
chosen because the Civil Aeronautics Board requires all certified air
carriers to file quarterly income statements and balance sheets. The
fact that balance sheets are reported is important because the
statistical analysis requires the availability of internal data for a firm
for a number of accounting periods.

The basic requirement for a firm to be selected was the
availability of income statements and balance sheets for 60 quarters. This
provided the 50 quarters recommended for model estimation and 10 quarters
for forecast error computation. Since not all airlines in the industry
were included, the sample selection process was not random.^


Operational  Definitions 


(a)  "Internal data": This data set was constructed from ratios
     and aggregate data. The ratios used were based on those
     compiled by Horrigan (1965) from a large number of sources
     dealing with financial statement analysis in various
     contexts. The ratios were compiled to be representative of
     those used in practice. The following is a list of these
     ratios:

Short  Term  Liquidity  Ratios 

(a)  Current  asset  to  current  debt 

(b)  Notes  receivable  plus  cash  to  current  liabilities 

Long  Term  Solvency  Ratios 

(a)  Net  worth  to  total  debt 

(b)  Net  worth  to  long-term  debt 

(c)  Net  worth  to  operating  plus  non-operational 
property 

Capital  Turnover  Ratios 

(a)  Total  operating  revenues  to  notes  receivable 

(b)  Total  operating  revenue  to  working  capital 

(c)  Total  operating  revenue  to  operating  plus  non- 
operating property 

(d)  Total  operating  revenue  to  net  worth 

(e)  Total  operating  revenue  to  total  assets 

Profit  Margins  Ratios 

(a)  Net  operating  profit  to  total  operating 
revenues 

(b)  Net  income  to  sales 


Return  on  Investment  Ratios 

(a)  Net  operating  profits  to  total  assets 

(b)  Net  profits  to  net  worth 

In addition to the above, "internal data" also included working
capital, operating income and total operating expenses.

(b)  "External data": This data set included the Dow Jones
     Industrial Index, Standard and Poor's Industry Index^
     and the firms' past stock price series.^ These were
     chosen to be representative of industry and market factors.

(c)  "Earnings data": This data set included primary
     earnings per share on common stock before extraordinary items,
     adjusted for stock splits and stock dividends.^

Construction  and  Application  of  the  Forecast  Models 
Rationale  and  Summary  of  the  Methodology 

The null hypothesis was tested by comparing the forecast errors of
a special case of SFM (5) with SFM (4). In particular, this method of
testing was employed because only a restricted form of SFM (5) has been
developed for general use. This form is called the linear transfer
function (which is discussed in detail in Appendix 4) and is represented
mathematically by the equation:

(2)  $y_t = f[f_1(y_{t-1}, \ldots), f_2(x_t, \ldots)] + u(t)$

which is a special case of the general form of SFM (5) discussed in the
previous chapter.

In (2) the predicted variable $y_t$ is expressed as a function of the
two variables in brackets on the right side plus u(t), which is an error
term. This differs from the traditional Box-Jenkins analysis in that
an additional variable $x_t$ has been added.

The result is that equation (2) generalizes the ARIMA class of
models to two variables, allowing a data series containing potential
information to be added, thus allowing a test of the null hypothesis.
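
The idea of adding a second series to a univariate model can be
sketched with a deliberately simplified stand-in. The code below is not
the Box-Jenkins identification and estimation procedure: it assumes a
fixed structure with one lag of y and one lag of x, fits it by ordinary
least squares, uses simulated data with hypothetical coefficients (0.5
and 0.8), and produces one-step-ahead forecasts only.

```python
import random


def fit_one_regressor(y, z):
    """Least squares slope for y = a*z (no intercept)."""
    return sum(p * v for p, v in zip(z, y)) / sum(p * p for p in z)


def fit_two_regressors(y, z1, z2):
    """Least squares for y = a*z1 + b*z2 via the 2x2 normal equations."""
    s11 = sum(p * p for p in z1)
    s22 = sum(q * q for q in z2)
    s12 = sum(p * q for p, q in zip(z1, z2))
    s1y = sum(p * v for p, v in zip(z1, y))
    s2y = sum(q * v for q, v in zip(z2, y))
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


# Simulated series: 60 periods, x leads y by one period (assumed structure).
rng = random.Random(0)
n = 60
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0.0, 0.1))

estimation = range(1, 50)   # targets within the first 50 periods
holdout = range(50, 60)     # last 10 periods reserved for forecast errors

y_est = [y[t] for t in estimation]
y_lag = [y[t - 1] for t in estimation]
x_lag = [x[t - 1] for t in estimation]

a_uni = fit_one_regressor(y_est, y_lag)          # earnings series alone
a, b = fit_two_regressors(y_est, y_lag, x_lag)   # earnings plus input series

mae_uni = sum(abs(y[t] - a_uni * y[t - 1]) for t in holdout) / len(holdout)
mae_transfer = sum(abs(y[t] - (a * y[t - 1] + b * x[t - 1]))
                   for t in holdout) / len(holdout)

print(round(a, 2), round(b, 2), round(mae_uni, 3), round(mae_transfer, 3))
```

Because the simulated input series does carry information about y, the
two-variable fit forecasts the holdout periods more accurately. Whether
real internal or external data behave this way is the empirical question
the null hypothesis addresses.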


Theoretically this allows the null hypothesis to be tested by comparing
the univariate forecast errors to those of transfer functions
constructed by adding each of the variables in the internal-external data
sets, one by one, to the corresponding ARIMA models. For example, for
the first sample firm we could compare the ARIMA forecasts to transfer
function forecasts with the first ratio added. We would continue to
do this for all items in the internal-external data sets. Finally we
would repeat the procedure for the remaining 29 firms. Such an analysis
would work well except for three problems.

(1) The individual comparisons would not be independent of each other
and therefore interpretation of the results would be confounded. For
example, from a statistical standpoint it would be inappropriate to use
Student's t test, which requires that individual tests be independent of
each other. In addition we would expect a large number of the tests to
be significant due strictly to chance. For example, for 600 tests with
each test having an α level of .1 we would expect rejections for 60
(i.e., 600 x .1) of these tests due to chance alone. If we ignored this
problem of independence (which would be highly improper) and found
significance in 80 of the tests, we would not want to point to any
particular test and state that for that test a true difference in the
population exists. This is because there is no way to determine whether
rejection occurred by chance (α error) or was correct. These problems,
dependence between comparisons (lack of statistical independence) and
combined α errors, will be dealt with in detail below by using a MANOVA
design.

(2) In addition to the above, if one were to develop an experimental
design to overcome the problems of statistical independence and combined
α error, interpretation problems of multicollinearity among predictor
variables would still remain. For instance, assume that we were able to
state that in general the addition of ratio A reduces forecast error by
20 percent and the addition of ratio B reduces forecast error by 25
percent. Also assume that the two ratios are collinear and that we
are interested in considering the improvement in forecasts with both
ratio A and ratio B added at the same time. Note that, due to the
collinearity of the two ratios, we could not say that the combined
improvement would be the sum of the two individual improvements. The
problem would become even more complex with three or more ratios being
added. Therefore such an approach would not give a measure of the value
of a data set composed of a group of variables, nor would it enable one
to compare the predictive value of one group of variables to another.
In particular, since the internal-external data set is a group of
variables, it would not be possible to measure its predictive value (in
the event that it has predictive value).

(3) The analysis would require 600 transfer functions and 30 ARIMA
models, and the study would not be feasible due to the difficulty of
processing such a large number of models.
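
The arithmetic behind the combined α error can be made explicit. In this
sketch the figure of 600 tests comes from the text (it is consistent
with 30 firms and 20 candidate variables, which is a count inferred here
from the operational definitions rather than a stated figure). The
familywise formula assumes independent tests, which the comparisons
discussed above are not, so it is shown only as a reference point.

```python
def expected_false_rejections(n_tests, alpha):
    """Expected number of rejections when every null hypothesis is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests, alpha):
    """Probability of at least one false rejection, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_tests = 30 * 20   # 30 firms times 20 variables (inferred count)
alpha = 0.1

expected = expected_false_rejections(n_tests, alpha)
familywise = familywise_error_rate(n_tests, alpha)

print(n_tests, expected, familywise)
```

With 60 rejections expected by chance alone, an observed count of 80
gives no basis for singling out any individual comparison.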

Due to the problem of multicollinearity between predictor variables,
factor analysis with orthogonal rotation was used. A secondary problem
that led to use of factor analysis was the above mentioned one of a very
large number of models. Factor analysis was employed because it has two
useful properties that overcome both problems. In particular, factor
analysis with orthogonal rotation is a statistical method which reduces
a group of variables into a smaller number of variables which are not
multicollinear while at the same time retaining substantially all
information (statistical variation) in the original group of variables
(Morrison, 1967). The net result is a small number of non-multicollinear
variables.
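
The orthogonalization property can be illustrated in the simplest
possible setting. The sketch below rotates two collinear variables onto
their principal axes, which yields uncorrelated components. It is a
stand-in for, not an implementation of, factor analysis with orthogonal
rotation, and the two ratio series are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(u, v):
    """Sample covariance (n - 1 denominator)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def orthogonalize_pair(x, y):
    """Rotate two centered variables onto their principal axes."""
    vx, vy, cxy = covariance(x, x), covariance(y, y), covariance(x, y)
    theta = 0.5 * math.atan2(2.0 * cxy, vx - vy)   # angle that zeroes the covariance
    c, s = math.cos(theta), math.sin(theta)
    mx, my = mean(x), mean(y)
    first = [c * (a - mx) + s * (b - my) for a, b in zip(x, y)]
    second = [-s * (a - mx) + c * (b - my) for a, b in zip(x, y)]
    return first, second


# Two hypothetical, strongly collinear ratios observed over six quarters.
ratio_a = [1.0, 1.2, 1.1, 1.5, 1.4, 1.8]
ratio_b = [2.1, 2.3, 2.4, 3.0, 2.7, 3.5]

first, second = orthogonalize_pair(ratio_a, ratio_b)
total = covariance(ratio_a, ratio_a) + covariance(ratio_b, ratio_b)
share = covariance(first, first) / total   # variation carried by the first component

print(round(covariance(first, second), 12), round(share, 3))
```

In this illustration a single uncorrelated component carries nearly all
of the variation in the two original ratios, which is the sense in which
the data are both reduced and orthogonalized.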

The following is a four step summary of the method used to test
the null hypothesis:

Step 1:  The internal and external data sets are pooled and jointly
         factor analyzed (using orthogonal rotation) into
         uncorrelated components. The resultant factor loadings are
         used to construct indices (for a similar use of factor
         analysis see King, 1966).

Step 2:  (A) The above indices are used as input series into linear
         transfer function models.^ The transfer function models
         are identified and estimated for each firm and input
         series using approximately 50 quarters of data (the most
         recent 10 quarters are excluded from this step for purposes
         of testing the null hypothesis).

         (B) The transfer functions are applied to forecasting the
         10 quarters omitted in (A) above.

Step 3:  ARIMA models for adjusted EPS are constructed and applied
         to forecasting as in Step 2.

Step 4:  For each of the 10 forecasts, mean absolute percentage forecast
         errors are computed by factor index as well as for the ARIMA
         forecasts. Finally, means for the ARIMA absolute percentage
         errors are compared to the corresponding transfer function means.


Detailed  Discussion  of  the  Method 


Step 1 (factor analysis): The internal and external data sets
     are pooled into a common data set and then factored firm
     by firm cross-sectionally into distinct uncorrelated
     components.^ The resultant factor loadings are then applied
     to the data to produce firm by firm indices.

Each firm behaved approximately the same under the factor analysis.
Typically, four factors explained approximately 80 percent of the
variation in the data relating to each firm. In addition, a fifth
factor, if computed, typically explained a very small proportion of
the total variation (i.e., approximately 5 percent). For this reason
four factors were retained for each firm. Table 2 presents a firm by
firm analysis of the percent variation explained for each of the four
factors. For example, the first factor for firm 1 explained 24.7
percent of the variation in firm 1, factor 2 explained 19.3 percent
of the variation in firm 1 and the four factors combined explained a
total of 72.1 percent of the variation of the firm 1 data set.

These percentages can be interpreted in terms of the relative
importance of their associated factors with respect to information
content. It should be noted, however, that this information content is
determined with respect to predicting the values of the variables in
the original unfactored data set and not earnings per share. This can
be stated in statistical terms by saying that regression models built
using the factors and the original data set would result in R squared
values of approximately .8 or correlations of approximately .9. In
addition, it is possible to interpret the factors conceptually in terms
of the original variables. Such an analysis, however, will not be
undertaken in this research since the research design was oriented
towards the specific goals of data reduction and orthogonalization
rather than interpretation. (For a detailed interpretive factor
analysis see Pinches et al., 1975.)

Table 2

Percent Variation Explained For Each Firm By Factor

Firm No.   Factor 1   Factor 2   Factor 3   Factor 4   Firm Total
    1        24.7       19.3       14.9       13.2        72.1
    2        33.2       21.5       14.7        8.4        77.8
    3        28.3       27.5       13.6        8.3        77.7
    4        35.5       25.7       16.2        6.8        84.2
    5        45.6       20.3        7.9        6.6        80.4
    6        72.3        9.9        6.8        4.3        93.4
    7        54.0       14.2       10.6        7.6        86.4
    8        45.8       20.6       11.3        6.4        84.2
    9        31.0       22.0       18.7       11.2        82.8
   10        30.9       26.4       15.7        9.5        82.4
   11        33.9       19.0       15.6       11.9        80.4
   12        60.1       14.2        9.8        4.7        88.7
   13        32.0       18.3       14.7       11.3        76.3
   14        34.7       22.8       18.2        8.7        84.4
   15        37.2       18.2       11.0        8.7        75.2
   16        34.6       25.2       12.6       11.7        84.0
   17        35.8       24.7       17.5        5.2        82.6
   18        32.7       20.7       13.7       10.3        77.4
   19        42.8       18.0       12.1       11.3        84.3
   20        36.7       26.8       17.3        5.3        86.1
   21        32.5       30.1       10.2        7.1        79.9
   22        31.1       19.2       10.4       10.3        70.9
   23        42.5       26.2       11.9        5.3        85.9
   24        30.5       19.5       15.4       10.0        75.5
   25        41.5       20.6       13.6        7.5        83.3
   26        33.2       27.6       14.0       10.9        85.7
   27        36.6       24.9       15.8        6.8        84.0
   28        38.5       19.5       15.0        8.5        81.5
   29        47.0       18.3       17.0        5.2        87.4
   30        32.9       18.8       17.6       11.5        80.8

Steps 2 and 3 (modeling): For each firm four transfer function models
     and one univariate ARIMA model are constructed. Each of the four
     transfer models corresponds to one of the four factor indices
     constructed above. The total number of models constructed
     is therefore 120 (30 firms with four models each) for the
     transfer case and 30 (one for each firm) for the univariate
     case, resulting in a total of 150 models. Next, 10 forecasts
     are generated for each model, producing 1,500 forecasts.

An example of this procedure is given in the case of firm 1. In
this case four transfer functions and one ARIMA model are estimated.
In each case a base period of 50 quarters is used which includes
quarters beginning at the first quarter of 1962 and ending with the fourth
quarter of 1974 (see Appendix 6 for a complete listing of base periods
and forecast horizon). The next step is to compute forecasts for 10
quarters ahead. This results in 10 forecasts for each transfer function
and 10 forecasts for the ARIMA model, giving a total of 50 forecasts
for firm 1.
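
The model and forecast counts follow directly from the design. A brief
tally, with variable names chosen here for illustration:

```python
firms = 30
transfer_models_per_firm = 4    # one per factor index
univariate_models_per_firm = 1  # the ARIMA model on earnings alone
forecasts_per_model = 10        # one per quarter in the holdout period

transfer_models = firms * transfer_models_per_firm
univariate_models = firms * univariate_models_per_firm
total_models = transfer_models + univariate_models
total_forecasts = total_models * forecasts_per_model
forecasts_per_firm = (transfer_models_per_firm
                      + univariate_models_per_firm) * forecasts_per_model

print(transfer_models, univariate_models, total_models,
      total_forecasts, forecasts_per_firm)
```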

Step 4 (Test of the null hypothesis): The test of the null hypothesis was
     based on a MANOVA (multivariate analysis of variance) design,
     which is a simple generalization of ANOVA (analysis of
     variance). The difference between the two methods is that
     ANOVA tests for differences between means for a single
     variable whereas MANOVA tests for differences between means
     for a group (i.e., vector) of variables. In particular
     ANOVA tests the hypothesis:

     $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

     and MANOVA tests the hypothesis:

     $H_0: (\mu_{11}, \ldots, \mu_{1p})' = (\mu_{21}, \ldots, \mu_{2p})' = \cdots = (\mu_{k1}, \ldots, \mu_{kp})'$

     where $\mu_{ij}$ represents the mean of variable j in group i, for
     k groups and p variables.

Note that if p equals one then the MANOVA hypothesis reduces
to the ANOVA hypothesis.

The assumptions of MANOVA are simple multivariate generalizations
of the ANOVA assumptions of normality and homogeneity of variances. The
first assumption (multivariate normality) generally is considered
satisfied for reasonably large samples by the multivariate central limit
theorem (Morrison, 1967, pp. 80-81). With respect to the second
assumption (multivariate homogeneity of variances), MANOVA has been found
to be robust in the cases studied (Harris, 1975, p. 85; Morrison, 1967,
p. 152).

MANOVA can be applied to the problem at hand by letting the mean
absolute percentage forecast error associated with each of the four
factor indices be symbolized as forctype_i (i = 1, 2, 3, 4; forctype is
defined as forecast type). In addition, the mean absolute percentage
error associated with the univariate series is symbolized as forctype_5.
Finally, the means are broken down according to the number of steps
ahead (i.e., periods in the future) and symbolized as step_j (j = 1, 2,
..., 10). To summarize, $\mu_{i,j}$ represents the mean absolute percentage
error for forctype_i and step_j. This design is graphically presented in
Table 3.

In terms of the MANOVA design the null hypothesis can be stated
as:

$H_0: (\mu_{1,1}, \ldots, \mu_{1,10})' = (\mu_{2,1}, \ldots, \mu_{2,10})' = \cdots = (\mu_{5,1}, \ldots, \mu_{5,10})'$

Note that the null hypothesis states that there is no difference
between any of the five rows in Table 3. Specifically, row 5 represents
the ARIMA forecasts and the first four rows represent transfer function
forecasts. In addition the MANOVA test is very powerful (Cooley and
Lohnes, 1971, p. 228) and can be expected to lead to rejection of the
null hypothesis in the event that any two rows differ. In particular
if one of the first four rows (transfer functions) differs from the
fifth row (ARIMA models) we would expect a rejection of the null
hypothesis.

Table 3

MANOVA Design For Mean Absolute Percentage Forecast Errors

FORECAST                              STEP
TYPE       1      2      3      4      5      6      7      8      9      10
  1       μ1,1   μ1,2   μ1,3   μ1,4   μ1,5   μ1,6   μ1,7   μ1,8   μ1,9   μ1,10
  2       μ2,1   μ2,2   μ2,3   μ2,4   μ2,5   μ2,6   μ2,7   μ2,8   μ2,9   μ2,10
  3       μ3,1   μ3,2   μ3,3   μ3,4   μ3,5   μ3,6   μ3,7   μ3,8   μ3,9   μ3,10
  4       μ4,1   μ4,2   μ4,3   μ4,4   μ4,5   μ4,6   μ4,7   μ4,8   μ4,9   μ4,10
  5       μ5,1   μ5,2   μ5,3   μ5,4   μ5,5   μ5,6   μ5,7   μ5,8   μ5,9   μ5,10

Note that there are 50 cells with 30 observations in each cell
accounting for the 1500 forecasts.^
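
The cell means of the design can be assembled from the individual
forecast errors as sketched here. The error values are placeholders
generated by an arbitrary formula, not results from the study, and the
record layout (forecast type, step, error) is an assumption made for
illustration.

```python
from collections import defaultdict


def cell_means(records):
    """Average absolute percentage errors within each (forctype, step) cell.

    `records` is an iterable of (forctype, step, abs_pct_error) tuples.
    Returns (means, counts), both keyed by (forctype, step).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for forctype, step, error in records:
        totals[(forctype, step)] += error
        counts[(forctype, step)] += 1
    means = {cell: totals[cell] / counts[cell] for cell in totals}
    return means, dict(counts)


# Placeholder errors: 30 firms x 5 forecast types x 10 steps ahead.
records = [
    (forctype, step, 10.0 + step + 0.1 * firm)
    for firm in range(1, 31)
    for forctype in range(1, 6)
    for step in range(1, 11)
]

means, counts = cell_means(records)
print(len(records), len(means), counts[(5, 10)], round(means[(1, 1)], 2))
```

Each of the five rows of ten cell means forms the vector that the MANOVA
null hypothesis compares across forecast types.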

Given the above it is now possible to discuss reasons for the
selection of the MANOVA design. First, recall from the above discussion
on the rationale for the method that two potential problems, combined
alpha error and lack of statistical independence, exist when making
multiple statistical comparisons. This can be seen in terms of the above
design by noting that one could construct 10 F tests which tested for
differences between rows in each of the ten columns corresponding to the
10 steps ahead. One could also generate $\binom{50}{2}$, or 1,225,
different t tests by taking all possible pairs of cells. Of importance is
the fact that such tests would not be independent, nor would they guard
against combined alpha errors, and therefore they would be highly
inappropriate. For this reason a MANOVA design was chosen, because it is
not subject to these two problems. (For a more complete discussion see
Morrison, 1967, pp. 126-127 and Finn, 1974, p. 320.) A second reason
is, as cited above, that MANOVA is very powerful and therefore is very
likely to detect any true population differences.
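
As a rough illustration of the combined alpha error problem, the
following Python sketch counts the pairwise comparisons available in the
5 x 10 design and computes the familywise error rate that would result if
the tests were independent. The independence assumption and the .05 level
are illustrative assumptions only; as noted above, the actual tests would
not be independent, so the figures indicate the order of magnitude of the
problem rather than an exact rate.

```python
from math import comb

# Design dimensions taken from Table 3: 5 forecast types by 10 steps ahead.
N_FORECAST_TYPES = 5
N_STEPS = 10
n_cells = N_FORECAST_TYPES * N_STEPS

# Number of distinct pairs of cells, i.e. possible pairwise t tests.
n_pairwise_tests = comb(n_cells, 2)


def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false rejection among n_tests tests.

    Assumes (for illustration only) that the tests are independent and
    that each is run at the same level alpha.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative level; the source does not state a level for these tests.
ALPHA = 0.05
rate_ten_f_tests = familywise_error_rate(ALPHA, N_STEPS)
rate_all_t_tests = familywise_error_rate(ALPHA, n_pairwise_tests)

print(n_pairwise_tests)
print(round(rate_ten_f_tests, 4))
print(round(rate_all_t_tests, 4))
```

Under these assumptions even the ten column-wise F tests would carry a
combined error rate far above the nominal level, which is the motivation
for a single multivariate test.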

Discussion  of  the  Results  of  the  MANOVA 

The results for the MANOVA are presented below. The dotted lines
separate various comparable tests of the null hypothesis. Wilks'
criterion (often called Wilks' lambda)¹⁰ yields an F approximation of .55
with 40 and 513 degrees of freedom, for which the probability of a larger
F is .9893. The results indicate that rejection of the null hypothesis
is not warranted from the sample data.
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F(40,513)  = 0.55  PROB  > F = 0.9893 


The implication of the test is that the five types of forecasts do
not differ detectably in accuracy. Initially it might appear surprising
that the fourth factor, which explained a relatively small proportion of
the variation in the original data, performed as well as the first
factor, which explained a relatively large proportion of the variation.
This result, however, can be understood in light of the nature of the
bivariate transfer function, which models earnings together with
potential information entering as an additional series (the factor
index). If the additional series has no additional information with
predictive value, we would expect the forecasts to be no better or worse
than without the inclusion of the additional variable. Therefore, the
results are consistent with the hypothesis that the factor indices
provided no information for forecasting earnings beyond that contained
in the earnings series itself. In addition, due to the generality of the
MANOVA test, the results hold for each step ahead in the forecast
horizon taken by itself.
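
The computation behind Wilks' criterion can be sketched as follows. This
is a minimal illustration on synthetic data, not the program used in the
study: the function name wilks_lambda, the random data, and the layout of
5 groups of 30 ten-element error vectors are assumptions made for the
example, and the F statistic uses Rao's approximation, which may differ
in detail from the approximation reported above.

```python
import numpy as np


def wilks_lambda(groups):
    """One-way MANOVA via Wilks' lambda with Rao's F approximation.

    groups: list of arrays, each of shape (n_g, p), one array per group.
    Returns (lambda, F, df1, df2).
    """
    all_obs = np.vstack(groups)
    n_total, p = all_obs.shape
    n_groups = len(groups)
    grand_mean = all_obs.mean(axis=0)

    within = np.zeros((p, p))   # error (within-group) SSCP matrix
    between = np.zeros((p, p))  # hypothesis (between-group) SSCP matrix
    for grp in groups:
        grp_mean = grp.mean(axis=0)
        centered = grp - grp_mean
        within += centered.T @ centered
        diff = (grp_mean - grand_mean)[:, None]
        between += grp.shape[0] * (diff @ diff.T)

    # lambda = det(E) / det(E + H), computed on the log scale for stability.
    _, logdet_e = np.linalg.slogdet(within)
    _, logdet_t = np.linalg.slogdet(within + between)
    lam = float(np.exp(logdet_e - logdet_t))

    q = n_groups - 1            # hypothesis degrees of freedom
    v_e = n_total - n_groups    # error degrees of freedom
    denom = p ** 2 + q ** 2 - 5
    s = np.sqrt((p ** 2 * q ** 2 - 4) / denom) if denom > 0 else 1.0
    m = v_e - (p - q + 1) / 2.0
    df1 = p * q
    df2 = m * s - p * q / 2.0 + 1.0
    lam_root = lam ** (1.0 / s)
    f_stat = (1.0 - lam_root) / lam_root * df2 / df1
    return lam, float(f_stat), df1, float(df2)


# Synthetic example: 5 forecast types, 30 firms each, 10 steps ahead.
rng = np.random.default_rng(0)
example_groups = [rng.standard_normal((30, 10)) for _ in range(5)]
lam, f_stat, df1, df2 = wilks_lambda(example_groups)
print(lam, f_stat, df1, df2)
```

Because the synthetic groups are drawn from the same distribution, the
statistic should usually be unremarkable, which mirrors the situation in
which the null hypothesis is not rejected.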

Finally,  note  that  in  the  event  that  the  null  hypothesis  had  been 
rejected,  a more  detailed  analysis  would  have  followed.  The  purpose 
of  this  analysis  would  have  been  to  isolate  the  reason  for  rejecting 
the  null  hypothesis.  There  are  several  methods  that  can  be  used  to 
conduct  a detailed  analysis.  One  such  approach  that  has  been 
recommended  in  the  literature  (Cooley  and  Lohnes,  1971,  p.  230;  Finn, 
1974,  p.  320)  is  to  inspect  individual  F ratios.  The  present 
discussion  is  limited  to  this  method. 

In terms of the above design, it would be possible to use F tests
to test for differences between rows in each of the 10 columns
corresponding to the 10 steps ahead. For the F tests that were
rejected one might construct Student's t tests to make pairwise
comparisons of means within each of the 10 columns for purposes of
isolating differences between the fifth row (ARIMA forecast errors)
and the first through fourth rows (transfer function errors). For
example, one might have found from the F tests that
$\mu_{1,1} \neq \mu_{2,1} \neq \mu_{3,1} \neq \mu_{4,1} \neq \mu_{5,1}$
and from the t tests that $\mu_{1,1} \neq \mu_{5,1}$.

It is important to note that conclusions resulting from this
procedure should be drawn with care. This is true in spite of the fact
that this procedure is often recommended in the literature (Cooley and
Lohnes, 1971, p. 230; Finn, 1974, p. 320), because the detailed tests
are still not independent. Rejection of the null hypothesis does not
mean that the follow-up tests are independent, but only that there is a
justification for making them. The net result is that the tests should
be regarded as valid; however, the alpha errors should be considered
approximate.


Notes 


¹ The sample is fully described in Appendix 1.

² Net worth to long term debt could not be computed for firms 7 and 21 due
to the nonexistence of long term debt for a substantial number of quarters.

³ Both the Dow and Industry indices were computed by averaging quarterly
data taken from Security Owners Stock Guide (Standard and Poor's
Corporation).

⁴ The stock prices were adjusted for splits and dividends. Also, for some
firms a complete series of stock prices was not available. A list of
these firms is presented in Appendix 2.

⁵ For a detailed discussion of earnings per share see Appendix 3.

⁶ See Appendix 4 for a description of the linear transfer function model.

⁷ Initially an attempt was made to factor a somewhat larger data set which
included all of the items that were used to compute the ratios (i.e.,
total assets, net worth, etc.), but these items were deleted because they
were so highly correlated with the ratios that they introduced numerical
problems.

⁸ Absolute percentage forecast error is computed by the formula:

$$\left| \frac{\text{actual EPS} - \text{forecasted EPS}}{\text{actual EPS}} \right|$$

⁹ In a relatively small number of cases there were missing data due to
missing actual EPS, and cases where the absolute percentage forecast error
was not computed due to zeros in the denominators. In such cases
observations were replaced by their cell means. A listing of these cases
is presented in Appendix 6.

¹⁰ This criterion alone is discussed because it is probably the most common
and well known. For a discussion of the various MANOVA statistics see
Morrison (1967). Note that all of the tests presented resulted in
approximately the same significance level.


CHAPTER  III 


GENERAL  SUMMARY,  CONCLUSIONS,  IMPLICATIONS, 

LIMITATIONS,  AND  SUGGESTIONS  FOR 
FUTURE  RESEARCH 

Introduction 

In recent years there has been an increased emphasis on the
forecasting of earnings per share. For example, the SEC has considered
making such forecasting a formal requirement in connection with external
reporting. In addition, empirical research has demonstrated that a high
percentage of financial analysts use earnings forecasts in their decision
making (Norby, 1973; Lorek, et al., 1976).

The recent emphasis on future earnings has led to a wide variety of
studies utilizing statistical forecast models (SFMs). It was demonstrated
that all of the SFMs employed in the literature ignored data that could
possibly lead to better forecasts if considered. Examples of such SFMs
are those related to multiple regression and the autoregressive
integrated moving average (ARIMA) models. The ARIMA models are not capable
of considering information that might be obtained from adding other
variables to the analysis. On the other hand, the multiple regression
models do have the ability to model more than one variable but do not have
the capability of considering time series properties.

The  Problem  and  Purpose 

The use of earnings forecasts in decision making implies that the use
of suboptimal forecasts can lead to dysfunctional decisions or undesirable
inferences. Notably, however, the specification of an SFM depends
on the data set used in its estimation and application, and in order to
select an optimal SFM the forecaster must know which data to select. In
an economic sense the rational forecaster will use a "cost-benefit"
criterion in selecting among alternative data sets. This implies that
the forecaster needs to know the value of the alternative data sets. This
valuation of data must be made in terms of its contribution to improving
forecasting.

The  purpose  of  the  present  study  is  to  develop  and  apply  a general  SFM 
to  the  problem  of  valuing  alternative  data  sets.  This  SFM  avoids  to  a 
high  degree  the  above  mentioned  limitations  of  SFMs  which  are  often  employed 
in  practice. 

Method 

Recent  research  has  given  a considerable  amount  of  attention  to 
ARIMA  forecast  models.  As  pointed  out  above,  these  models  use  only  the 
earnings  variable  and  ignore  other  possible  sources  of  useful  information 
for  prediction  such  as  ratio,  market  or  industry  data.  In  light  of  this 
omission  it  is  hypothesized  that  the  addition  of  variables  to  the  ARIMA 
models  will  result  in  improved  forecasts.  Specifically  the  null  hypothesis 
tested  is  stated  below. 

Null  Hypothesis 

H0:  The average absolute percentage forecast error for the
     bivariate ARIMA model utilizing composite indices taken
     from ratio, market and industry data in addition to
     earnings is equal to the average absolute percentage
     forecast error based on a univariate ARIMA model utilizing
     earnings alone.

Ha:  H0 is not true.


Note  that  the  key  factor  in  testing  the  null  hypothesis  is  the  use 
of  a bivariate  ARIMA  model  which  is  a simple  two  variable  generalization 
of  the  well  known  ARIMA  model  and  is  referred  to  as  a linear  transfer 
function  (fully  described  in  Appendix  4). 

Sample 

A sample of 30 airlines was chosen. The airline industry was
selected because the CAB requires all certified air carriers to file
quarterly income statements and balance sheets, which makes the required
data obtainable.

Test  of  the  Null  Hypothesis 

The  following  is  a four  step  summary  of  the  methodology  used  to  test 
the  null  hypothesis: 

Step 1:  The internal and external data sets were pooled and jointly
         factor analyzed (using orthogonal rotation) into uncorrelated
         components. The resultant factor loadings were used to
         construct indices.

Step 2:  (A) The above indices were used as input series into linear
         transfer function models. The transfer function models were
         identified and estimated for each firm and input series using
         approximately 50 quarters of data (the most recent 10 quarters
         were excluded from this step for purposes of testing the null
         hypothesis).

         (B) The transfer functions were applied to forecasting the 10
         quarters omitted in (A) above.

Step 3:  ARIMA models for adjusted EPS were constructed and applied to
         forecasting as in Step 2.

Step 4:  For each step ahead, mean absolute percentage forecast errors
         were computed by factor index as well as for the ARIMA
         forecasts. Finally, means for the ARIMA absolute percentage
         errors were compared to the corresponding transfer function
         means by using a MANOVA design.

Results

Step 1 (factor analysis) revealed that four factors typically
explained approximately 80 percent of the variation in the original data.
For this reason, four factors were retained for each firm. These factors
were then applied to the original data to produce four factor indices for
each firm.
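
The construction of uncorrelated indices and the "proportion of variation
explained" criterion can be illustrated with the following sketch. It
uses principal components of standardized variables as a simplified
stand-in for the rotated factor analysis actually employed, so it is not
the procedure of the study. The function name, the synthetic data (60
quarters by 9 variables), and the choice of 9 variables are all
assumptions made for the example.

```python
import numpy as np


def principal_component_indices(data, n_components=4):
    """Return (indices, explained) for a (quarters x variables) array.

    indices:   scores on the first n_components principal components,
               which are mutually uncorrelated by construction.
    explained: proportion of total variance explained by every component,
               in descending order.
    """
    # Standardize each variable so that scale differences do not dominate.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    explained = singular_values ** 2 / np.sum(singular_values ** 2)
    indices = z @ vt[:n_components].T
    return indices, explained


# Synthetic data: 9 observed variables driven by 4 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((60, 4))
loadings = rng.standard_normal((4, 9))
data = latent @ loadings + 0.1 * rng.standard_normal((60, 9))

indices, explained = principal_component_indices(data, n_components=4)
share_first_four = explained[:4].sum()
index_correlations = np.corrcoef(indices, rowvar=False)
print(share_first_four)
```

In this synthetic case the first four components explain nearly all of the
variation because the data were generated from four factors; with real
ratio, market and industry data the share would have to be inspected firm
by firm.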

In steps 2 and 3, transfer functions were generated for each of the
four factors. This process was done for each of the 30 sample firms,
giving a total of 120 transfer functions. In addition, ARIMA models were
constructed for each of the 30 firms, giving a total of 150 (120 + 30)
models. Finally, 10 forecasts were computed for each model, giving a total
of 1,500 forecasts for analysis (with the exception of a small amount of
missing data discussed above).

In step 4, mean absolute percentage forecast errors were computed by
forecast type and the number of steps ahead. There were five forecast
types corresponding to the four factor indices and the ARIMA models.

Finally, the mean absolute percentage forecast errors were put into a
multivariate analysis of variance (MANOVA) design and the null hypothesis
was tested. It was found that none of the transfer function forecasts
differed significantly in accuracy from the ARIMA forecasts. The
implications and conclusions are discussed in the next section.
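
The arrangement of the forecast errors into the 5 x 10 design can be
sketched as follows. The example is hedged in several respects: the data
are synthetic, the function names are hypothetical, and the errors are
kept as fractions rather than multiplied by 100. The handling of missing
values follows the description given earlier (undefined errors are
replaced by their cell means), where a cell is taken to be one forecast
type at one step ahead.

```python
import numpy as np


def absolute_percentage_error(actual, forecast):
    """|actual - forecast| / |actual|, NaN where it cannot be computed."""
    actual, forecast = np.broadcast_arrays(
        np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    )
    errors = np.full(actual.shape, np.nan)
    # The error is undefined for missing values and for zero actual EPS.
    valid = np.isfinite(actual) & np.isfinite(forecast) & (actual != 0)
    errors[valid] = np.abs(actual[valid] - forecast[valid]) / np.abs(actual[valid])
    return errors


def fill_with_cell_means(errors):
    """Replace NaN entries by the mean of their (type, step) cell.

    errors has shape (forecast types, firms, steps ahead).
    """
    cell_means = np.nanmean(errors, axis=1, keepdims=True)
    return np.where(np.isnan(errors), cell_means, errors)


# Synthetic example: 5 forecast types, 30 firms, 10 steps ahead.
rng = np.random.default_rng(0)
n_types, n_firms, n_steps = 5, 30, 10
actual = rng.uniform(0.5, 3.0, size=(n_firms, n_steps))
actual[3, 2] = 0.0       # zero denominator
actual[7, 5] = np.nan    # missing actual EPS
noise = rng.standard_normal((n_types, n_firms, n_steps))
forecast = actual * (1.0 + 0.2 * noise)

errors = absolute_percentage_error(actual, forecast)
filled = fill_with_cell_means(errors)
mape_table = filled.mean(axis=1)   # rows: forecast type, columns: step
print(mape_table.shape)
```

The resulting table has the same layout as the MANOVA design, with one
mean per forecast type and step ahead.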


General  Conclusion  and  Implications 

In light of the results of the multivariate analysis of variance
summarized in the previous section, the conclusion of this study is that
composite uncorrelated indices computed from ratio, market and industry
data provide no additional information in forecasting earnings per share
beyond that already contained in the earnings series itself. The results
of the study tend to cast doubt on the hypothesis that ARIMA forecasts can
be improved by considering additional data in the internal and external
data sets.

Limitations

Two types of limitations will be discussed: (1) those relating to
the modeling process and (2) those relating to generalizability.

Modeling

While all of the models used in the present study conform to what
are considered to be acceptable models, it must be recognized that the
statistical tests involved are conditional on these models, and in
particular different modeling processes might lead to different results.
For example, future statistical research might produce models that
generate forecasts that are more accurate than those generated by the
models used in this study.

Generalizability

The primary limitations of the study are the particular industry
and particular ratios chosen. Each will be discussed individually.

(1)  Industry: The sample of firms used was restricted to the airline
     industry, which is subject to fare regulation.¹ It might be the
     case that a similar study would produce different results in an
     industry that is not subject to such regulation.

(2)  Ratios: The set of ratios employed in the study was selected on
     the basis of being representative of those used in practice. It
     must be acknowledged, however, that the analysis did not include
     all possible ratios that could have been included. For example,
     many other ratios such as depreciation to sales or sales to current
     assets could have been used.

Suggestions  for  Future  Research 
Examine  other  industries 

The primary reason for restricting the study to the airline industry
was that it was not possible to obtain a large sample (i.e., 30 or
more firms) in other industries. However, it might be possible to
repeat the study in another industry using a small sample (e.g., 10 to 30
firms).

Examine  other  ratios 

As  previously  discussed,  the  study  concentrated  on  those  ratios  which 
are  commonly  used  in  practice;  however,  it  would  be  possible  to  identify 
and  examine  those  ratios  that  have  been  discussed  in  the  literature,  but 
not  used  in  this  study. 

Develop  a contingency  theory 

It might be possible, using economic analysis, to develop a theory
that specifies under what circumstances, if any, a transfer function
model for earnings should outperform an ARIMA model. This would
involve making a set of assumptions with respect to economic events and
individual firm behavior. The assumptions would be expressed in
mathematical terms and then solved for the ARIMA and transfer function
forecast functions.


Note 


It  should  be  noted  that  airline  firms  report  under  generally  accepted 
accounting  principles. 


APPENDIX  1 


DESCRIPTION  OF  THE  SAMPLE  EMPLOYED 

The constraining factor in sample selection was meeting the above
mentioned information requirements. Random sampling was not possible
because only 30 firms met the information requirements of 60 quarters of
income statement and balance sheet information.¹ Below is a list of these
firms.²


 1.  Airlift International
 2.  Alaska Airlines
 3.  Aloha Airlines
 4.  American Airlines
 5.  Aspen Airways
 6.  Braniff Airways
 7.  Caribbean Atlantic Airlines
 8.  Continental Airlines
 9.  Delta Airlines
10.  Eastern Airlines
11.  Tiger International Airlines
12.  Frontier Airlines
13.  Hawaiian Airlines
14.  National Airlines
15.  New York Airways
16.  North Central Airlines
17.  North West Airlines
18.  Ozark Airlines
19.  Pan American Airways
20.  Piedmont Airlines
21.  Reeve Airlines
22.  SFO Airlines
23.  Seaboard World Airlines
24.  Southern Airways
25.  Texan International Airlines
26.  Trans World Airlines
27.  UAL (United Airlines)
28.  Western Airlines
29.  Wien Airlines
30.  Allegheney Airlines

Each  firm  will  be  subsequently  referred  to  by  the  identifying  number  that 
precedes  it. 
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APPENDIX  2 


LIST  OF  FIRMS  FOR  WHICH  STOCK  PRICE  WAS  NOT  INCLUDED 

Complete  series  of  stock  prices  were  not  available  for  all  firms. 
For  these  firms,  stock  price  was  not  included  in  the  variables  factored. 
The  following  is  a list  of  these  firms: 

1.  Airlift  International 

2.  Aloha  Airlines

3.  Aspen  Airways 

4.  Caribbean  Atlantic  Airlines 

5.  New  York  Airways 

6.  North  Central  Airlines 

7.  North  West  Airlines 

8.  Reeve  Airlines 

9.  SFO  Airways 

10.  Southern  Airways 

11.  Wien  Airways 
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APPENDIX  3 


DESCRIPTION  OF  EARNINGS  PER  SHARE 


Earnings per share were taken from Moody's Handbook of Common
Stocks. For those firms which were not reported in Moody's the EPS
were computed³ using information from schedule B-3 of Civil
Aeronautics Board form 41 in conjunction with the Civil Aeronautics Board
quarterly periodical Air Carrier Financial Statistics. In all cases
the EPS were adjusted for stock splits and stock dividends. Below
is a list of those firms for which the EPS had to be computed.

 1.  Airlift Airlines
 2.  Alaska Airways
 3.  Aloha Airlines
 4.  Aspen Airways
 5.  Caribbean Atlantic Airlines
 6.  Frontier Airlines
 7.  Hawaiian Airlines
 8.  New York Airways
 9.  North Central Airlines
10.  Ozark Airline
11.  Piedmont Airlines
12.  Reeve Airlines
13.  SFO Airlines
14.  Southern Airways
15.  Texan International Airlines
16.  Wien Airlines
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APPENDIX  4 


DESCRIPTION  OF  THE  LINEAR  TRANSFER  FUNCTION  MODEL 

The linear transfer function describes a statistical relationship
between two time series (an output and input series). In the context of
the present study a graphical example is given:

[Diagram: the input series is labeled "combined internal and external
data series".]

Mathematically, discrete dynamic systems are often parsimoniously
represented by the general linear difference equation

(1)  $(1 + \xi_1 \nabla + \cdots + \xi_r \nabla^r) Y_t = g (1 + \eta_1 \nabla + \cdots + \eta_s \nabla^s) X_{t-b}$

which may be written in terms of the backward shift operator
$B = 1 - \nabla$ as

(2)  $(1 - \delta_1 B - \cdots - \delta_r B^r) Y_t = (W_0 - W_1 B - \cdots - W_s B^s) X_{t-b}$

or equivalently, writing⁴ $\Omega(B) = W(B) B^b$, the model becomes

(3)  $\delta(B) Y_t = \Omega(B) X_t$

and⁵

(4)  $Y_t = \delta^{-1}(B) \Omega(B) X_t$

where

(5)  $\delta^{-1}(B) \Omega(B)$ is the transfer function.

In practice the system (4) will be infected with noise whose net
effect is to corrupt the output $Y_t$ of the transfer function by an amount
$N_t$, which is a modelable noise component (i.e., the residuals in (4) are
modeled by an ARIMA process). In addition $Y_t$ might be subject to
deterministic drift by an amount $\theta_0$.

Finally, letting $N_t = \phi^{-1}(B) \theta(B) a_t$, where $\phi(B)$ and
$\theta(B)$ are autoregressive and moving average operators respectively
(with $a_t$ white noise), the complete process becomes⁶

(6)  $Y_t = [\delta^{-1}(B) \Omega(B) X_t + \theta_0] + \phi^{-1}(B) \theta(B) a_t$
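
A small numerical sketch may help fix ideas about the complete process in
(6). It considers only the simplest case, with one output lag parameter,
one input lag parameter, a delay of b periods and first-order
autoregressive noise; the function names, parameter values and the impulse
input are assumptions chosen for illustration and are not taken from the
models estimated in the study.

```python
import numpy as np


def transfer_component(x, delta1, w0, b):
    """Deterministic part u_t = delta1 * u_{t-1} + w0 * x_{t-b}.

    This is the recursion implied by (1 - delta1 * B) u_t = w0 * x_{t-b},
    started from zero initial conditions.
    """
    u = np.zeros(len(x))
    for t in range(len(x)):
        previous = u[t - 1] if t >= 1 else 0.0
        delayed_input = x[t - b] if t >= b else 0.0
        u[t] = delta1 * previous + w0 * delayed_input
    return u


def ar1_noise(shocks, phi1):
    """Noise component N_t = phi1 * N_{t-1} + a_t (no moving average part)."""
    noise = np.zeros(len(shocks))
    for t in range(len(shocks)):
        noise[t] = (phi1 * noise[t - 1] if t >= 1 else 0.0) + shocks[t]
    return noise


# Response to a unit impulse in the input, with a two-period delay.
impulse = np.zeros(8)
impulse[0] = 1.0
impulse_response = transfer_component(impulse, delta1=0.5, w0=2.0, b=2)

# Complete process: transfer component plus drift plus modeled noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(60)
theta0 = 0.1
y = transfer_component(x, 0.5, 2.0, 2) + theta0 + ar1_noise(
    0.3 * rng.standard_normal(60), phi1=0.4
)
print(impulse_response)
```

The impulse response is zero for the first b periods and then decays
geometrically at rate delta1, which is the pattern the identification
stage attempts to recognize.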

Statistically the problem is to estimate from (6):

(A)  The parameters in $\delta(B)$ (generally referred to as output lag
     parameters).

(B)  The parameters in $\Omega(B)$ (generally referred to as input lag
     parameters).

(C)  The parameters in $\phi(B)$ and $\theta(B)$, which relate respectively
     to the univariate autoregressive and moving average factors⁷ used to
     transform $N_t$ to white noise $a_t$ (in general
     $\phi^{-1}(B) \theta(B)$ is referred to as the noise model).

This estimation is generally accomplished by minimizing

$\sum_{t=1}^{k} a_t^2(\delta(B), \Omega(B), \theta_0, \phi(B), \theta(B))$,  (k = number of $a_t$ available)

which will result in six types of noise model parameters and three types
of transfer function parameters, which will be referred to as Type 1,
Type 2, etc.:

Noise  model  parameters 

Type  (1)  ordinary  autoregressive 
Type  (2)  seasonal  autoregressive 

Type  (3)  possible  series  mean  (in  the  stationary  case) 

Type  (4)  possible  deterministic  trend
Type  (5)  ordinary  moving  average 
Type  (6)  seasonal  moving  average 

Transfer  function  parameters 

Type  (7)  output  lag 

Type  (8)  possible  deterministic  trend  constant 
Type  (9)  input  lag 


In addition, to completely specify the system (6), the value of
b in (2) must be known.

The estimation of the 9 types of parameters can be efficient only
if the form of the system is known.⁸ This stage of the analysis is known
as identification⁹ and is generally accomplished by using the method of
prewhitening, which involves examining the cross correlation function
between the input series which has been transformed to white noise and
the correspondingly transformed output series.¹⁰ Finally, diagnostic
checks can be performed by examining the autocorrelation function of the
residuals $a_t$ as well as the cross correlation function between the
transformed input series and the residuals.

The process is iterative: if the diagnostic checks for model
adequacy fail, there is reidentification and re-estimation until the
diagnostic checks are passed.
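
The prewhitening step described above can be sketched in a simplified
form. The example assumes, purely for illustration, that the input series
follows a first-order autoregressive process, so that a single estimated
coefficient suffices to transform it to white noise; the same filter is
then applied to the output series and the cross correlations are examined
for the lag at which the response begins. The function names, the
simulated series and the parameter values are hypothetical.

```python
import numpy as np


def fit_ar1(series):
    """Least squares estimate of phi in x_t = phi * x_{t-1} + e_t."""
    centered = series - series.mean()
    return float(centered[1:] @ centered[:-1] / (centered[:-1] @ centered[:-1]))


def apply_ar1_filter(series, phi):
    """Return series_t - phi * series_{t-1} after removing the mean."""
    centered = series - series.mean()
    return centered[1:] - phi * centered[:-1]


def cross_correlations(alpha, beta, max_lag):
    """Sample correlation between alpha_t and beta_{t+k}, k = 0..max_lag."""
    a = alpha - alpha.mean()
    b = beta - beta.mean()
    n = len(a)
    scale = n * a.std() * b.std()
    return np.array([(a[: n - k] @ b[k:]) / scale for k in range(max_lag + 1)])


# Simulated input (AR(1)) and an output that responds after 3 periods.
rng = np.random.default_rng(1)
n_obs, true_delay, w0 = 2000, 3, 2.0
shocks = rng.standard_normal(n_obs)
x = np.zeros(n_obs)
for t in range(1, n_obs):
    x[t] = 0.7 * x[t - 1] + shocks[t]
y = np.zeros(n_obs)
y[true_delay:] = w0 * x[:-true_delay]
y += 0.5 * rng.standard_normal(n_obs)

phi_hat = fit_ar1(x)
alpha = apply_ar1_filter(x, phi_hat)   # prewhitened input
beta = apply_ar1_filter(y, phi_hat)    # output passed through the same filter
ccf = cross_correlations(alpha, beta, max_lag=8)
estimated_delay = int(np.argmax(np.abs(ccf)))
print(estimated_delay)
```

A long simulated series is used so that the pattern is unambiguous; with
roughly 50 quarters of data, as in the study, the cross correlations would
be considerably noisier and would need to be judged against their sampling
variability.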


APPENDIX  5 


APPLICATION  OF  THE  TRANSFER  FUNCTION 
AND  UNIVARIATE  MODELS 

This  appendix  presents  the  results  of  transfer  and  univariate  ARIMA 
modeling.  Specifically,  section  one  discusses  the  bivariate  models  and 
section  two  discusses  the  univariate  models.  The  models  themselves  follow 
section  two. 

Section  One 

For the 30 sample firms transfer and noise models were developed¹¹
according to the procedures discussed in Appendix 4. The final models
and parameters are presented below. The models are presented in
abbreviated form; therefore, as an example, the results for firm 1 will be
discussed line by line.

Line 1:  "... Results for firm 1 ..." This line is self explanatory.

Line 2:  "models (1 0 0 0 0 0 2 1 1)" Each of the 9 numbers in this
         vector corresponds to the number of parameters in each parameter
         type.¹² Recall from Appendix 4 that the 9 parameter types are

Noise  model  parameters 

Type  (1)  ordinary  autoregressive 
Type  (2)  seasonal  autoregressive 
Type  (3)  possible  series  mean 
Type  (4)  possible  deterministic  trend 
Type  (5)  ordinary  moving  average 
Type  (6)  seasonal  moving  average 
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Transfer  function  parameters 
Type  (7)  output  lag
Type  (8)  possible  deterministic  trend  constant
Type  (9)  input  lag

In the present case there is one type 1 parameter, two type 7 parameters,
one type 8 parameter and one type 9 parameter. Next, the delay b is
1 (from "B=1"), and in "x DIFF = (0,0)" the first 0 states that there is no
regular differencing on the input series while the second 0 signifies
that there is no quarterly seasonal differencing on the input series.
In both cases 1's would have implied that differencing existed.¹³ The
coding is identical for the Y (output) series.

Line 3:  "101 701 702 800 900" In each of these numbers the first digit
         refers to the parameter type and the last two digits refer to
         the order of the parameter.¹⁵ In the present case the 101
         means a type 1 parameter of order 01, the 701 means a type 7
         parameter of order 01, the 702 means a type 7 parameter of
         order 02, the 800 means a type 8 parameter of order 00, and
         finally the 900 means a type 9 parameter of order 00.

Line 4:  This line gives the values of the parameters immediately below
         their descriptors on line 3. For example parameter 101 has the
         value .66207D+00, which is in standard Fortran D format.¹⁶
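
The coding scheme just described can be checked mechanically. The
following sketch decodes the three-digit descriptors, rebuilds the count
vector of line 2 from the descriptors of line 3, and converts a Fortran D
format value to an ordinary number. The function names are hypothetical,
and the type labels follow the listing in Appendix 4 (type 7 as output lag
and type 9 as input lag), which is an assumption about the intended
labeling.

```python
PARAMETER_TYPES = {
    1: "ordinary autoregressive",
    2: "seasonal autoregressive",
    3: "possible series mean",
    4: "possible deterministic trend",
    5: "ordinary moving average",
    6: "seasonal moving average",
    7: "output lag",
    8: "possible deterministic trend constant",
    9: "input lag",
}


def decode_descriptor(code):
    """Split a descriptor such as '702' into (type, type label, order)."""
    parameter_type = int(code[0])
    order = int(code[1:])
    return parameter_type, PARAMETER_TYPES[parameter_type], order


def count_vector(codes):
    """Number of parameters of each of the 9 types, as printed on line 2."""
    counts = [0] * 9
    for code in codes:
        parameter_type, _, _ = decode_descriptor(code)
        counts[parameter_type - 1] += 1
    return counts


def parse_fortran_d(text):
    """Convert a Fortran D format string such as '.66207D+00' to a float."""
    return float(text.replace(" ", "").upper().replace("D", "E"))


firm_1_codes = ["101", "701", "702", "800", "900"]
firm_1_counts = count_vector(firm_1_codes)
first_value = parse_fortran_d(".66207D+00")
print(firm_1_counts, first_value)
```

For firm 1 the descriptors on line 3 reproduce the vector on line 2, which
provides a simple consistency check when reading the abbreviated output.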


Section  Two 

This  section  presents  the  results  of  univariate  modeling  on  the 
adjusted  EPS.  The  results  are  presented  in  the  same  format  as  those 
for  the  noise  model  in  Section  1 above. 


***RESULTS  OF  TRANSFER  MODELING  WITH  AIRLINE  DATA*** 
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APPENDIX  6 


DESCRIPTION  OF  AVAILABLE  DATA  FOR 
FORECAST  ERROR  ANALYSIS 


This appendix gives a firm by firm description of the number of
quarters of data available for forecast error analysis. For each firm
the number of periods in the base period, the origin date for
forecasting, and the number of absolute percentage forecast errors are
presented.

Firm     Number of periods   Origin date for   Number of absolute
number   in base period      forecasting¹⁷     percentage forecast errors

   1           50                 4/74                   10
   2           49                 4/74                   10
   3           50                 4/74                   10
   4           50                 4/74                   10
   5           30                 4/74                   10
   6           30                 3/74                    8
   7           40                 4/71                    9
   8           50                 4/74                    7
   9           50                 4/74                   10
  10           50                 4/74                   10
  11           50                 4/74                   10
  12           50                 4/74                   10
  13           49                 4/74                   10
  14           50                 4/74                   10
  15           50                 4/74                   10
  16           50                 4/74                   10
  17           50                 4/74                   10
  18           50                 4/74                   10
  19           50                 4/74                   10
  20           50                 4/74                   10
  21           50                 4/74                   10
  22           42                 2/74                   10
  23           50                 4/74                   10
  24           50                 4/74                   10
  25           50                 4/74                   10
  26           50                 4/74                   10
  27           50                 4/74                   10
  28           50                 4/74                   10
  29           30                 2/76                    2
  30           50                 4/74                   10

For example, in the case of firm 1, 50 quarters of data were used in
transfer and univariate estimation, and forecasts were computed and
compared with actual values over a 10 period forecast horizon, with the
first forecast being for the first quarter of 1974.


NOTES  TO  APPENDICES 


¹ A few of the firms selected did not exactly meet these criteria but were
close enough to be accepted. A description of the data available for
each firm accepted is presented in Appendix 6.

² The data were taken from Air Carrier Financial Statistics, which is a
Civil Aeronautics Board quarterly publication.

³ The computations were based on the number of common shares outstanding
at the end of the quarter and adjusted for common stock equivalents having
dilutive effects.

⁴ b represents the delay time between the input and output series. For
example the model $Y_t = W_0 X_{t-4} + W_1 X_{t-5}$ has b = 4 with four
periods of delay. More precisely stated, b is the length of time that
occurs between a perturbation in the input series and a response in the
output series.

⁵ Note that if $X_t$ is white noise then (4) collapses to an ARIMA model.

⁶ Here $Y_t$ and $X_t$ have been replaced by their mean deviations if
stationary or replaced by their differenced series if nonstationary.

⁷ Ordinary and seasonal parameters are employed as in the Box and Jenkins
multiplicative model (Box and Jenkins, Chapter 9).

⁸ To know the form of the dynamic system means to know how many of each
of the nine types of parameters need to be used to describe that system.
In addition it is convenient to know the value of b, the delay parameter.

⁹ The modeling process for the bivariate case is similar to the univariate
case in that the three phases of identification, estimation, and
diagnosing are employed.

¹⁰ The same ARIMA model that transforms the input series to white noise is
applied to the output series. For a more detailed description see Box
and Jenkins (1970), Chapter 9.

¹¹ The computer programs used were those originally written by David Pack
of Ohio State University and modified for local use at the University
of Illinois by James McKeowen. The McKeowen version automatically passes
through the steps of identification, estimation, diagnosing and
forecasting. The diagnostic phase assures that all models employed meet
the standard criteria for acceptable models as discussed in Appendix 4.

¹² For a complete description of each type of parameter see Appendix 4.

¹³ For example, "x DIFF = (0,1)" would mean seasonal differencing only,
"x DIFF = (1,0)" would mean regular differencing only, and "x DIFF =
(1,1)" would mean both types of differencing.

¹⁵ The order of a parameter is the lag of that parameter or, more precisely
stated, it is the power of the backward shift operator that multiplies a
particular parameter. For example in the equation
$Y_t = (W_0 - W_1 B - W_2 B^2) X_{t-b}$, $W_0$ is of order 0 ($B^0 = 1$),
$W_1$ is of order 1 and $W_2$ is of order 2. Note that the mean, trend and
first input lag parameters will have order 0.

¹⁶ The D+00 means shift the decimal place 0 places to the right. For
example .270011D+02 means 27.001 and .270011D-02 means .0027001.

¹⁷ The number before the slash represents the quarter and the number after
the slash represents the year. For example, for firm 1 the origin date for
forecasting was 4/74 or the fourth quarter of 1974.
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